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Abstract: A common problem in the sciences is that a signal of interest 
is observed only indirectly, through smooth functionals of the signal whose 
values are then obscured by noise. In such inverse problems, the functionals 
dampen or entirely eliminate some of the signal's interesting features. This 
makes it difficult or even impossible to fully reconstruct the signal, even 
without noise. In this paper, we develop methods for handling sequences 
of related inverse problems, with the problems varying either systemati- 
cally or randomly over time. Such sequences often arise with automated 
data collection systems, like the data pipelines of large astronomical in- 
struments such as the Large Synoptic Survey Telescope (LSST). The LSST 
will observe each patch of the sky many times over its lifetime under vary- 
ing conditions. A possible additional complication in these problems is that 
the observational resolution is limited by the instrument, so that even with 
many repeated observations, only an approximation of the underlying signal 
can be reconstructed. We propose an efficient estimator for reconstructing 
a signal of interest given a sequence of related, resolution-limited inverse 
problems. We demonstrate our method's effectiveness in some representa- 
tive examples and provide theoretical support for its adoption. 

Keywords and phrases: deconvolution, ill-posed, image processing, sig- 
nal recovery. 



1. Introduction 



In many applications, data about a signal of interest can only be indirectly 
gathered. For instance, astronomical images from ground-based telescopes are 
observed through the blurring caused by atmospheric turbulence; Positron Emis- 
sion Tomography (PET) scanners measure photon intensities averaged over 
lines; and seismologists record the surface effects of earthquakes whose waves 
have been filtered by the Earth. In these examples and other such inverse prob- 
lems^ the basic measurements are smooth functionals of the signal that dampen 
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or entirely eliminate some of the signal's interesting features. This makes it dif- 
ficult, or sometimes impossible, to fully reconstruct the signal from noisy data. 

Over the years, a number of methods have been developed for the recovery 
of a signal under inverse problems. We cannot hope to provide a comprehensive 
list, but see O'Sullivan (1986); Wahba (1990); Donoho (1995); Tenorio (2001); 
Candes and Donoho (2002); Cavalier et al. (2002) and the references contained 
therein for an introduction and Cavalier (2008) for a modern review of the 
state of the field. Also, many disciplines have developed specific techniques for 
addressing particular issues, such as Astronomy (Starck, Pantin and Murtagh, 
2002; van Dyk et al., 2006) and Tomography (Olafsson and Quinto, 2005). 

However, the above cited work provides techniques and theory for situations 
in which an estimate of a signal is formed after only one observation. In many 
fields, recent technological advances have made it possible to automate data- 
collection, enabling repeated observations of the signal over time. While repeated 
observations can improve accuracy, it often raises new challenges as the inverse 
problems faced at different times can vary significantly. For example, the Large 
Synoptic Survey Telescope (LSST), a multi-year. Earth-based survey of the 
entire sky, will image space to an unprecedented depth and will catalog billions of 
astronomical objects. The LSST will take long sequences of images at each patch 
of sky, about 3 degrees on a side. In each sequence, the images will be separated 
in time by approximately 3-4 days. Each image in each sequence is taken with 
different blurring and distortion conditions. Thus, the viewing process represents 
sequences of related but distinct inverse problems. One scientific goal is to use 
these images to reconstruct the signal, which in this case is comprised of the 
underlying celestial structures, as accurately as possible. 

Notationally, we consider the following problem. We want to recover infor- 
mation about an unknown signal 9 G W from measurements of the form 

Y,^K,e + eW„ for t = l,2,.... (1) 

Here, each is a measured signal, such as an audio recording or a (vectorized) 
image represented as a p x 1 vector. Each forward operator Ki describes the 
measurement process and the W^'s are independent, mean zero Gaussian p- 
vectors with variance-covariance matrix Ip, the order p identity. 

The Ki represent both the damping of the signal present in an inverse problem 
and the necessary discretization due to the resolution-limited nature of most 
observational devices, most commonly through pixelization. The i^^'s are a priori 
unknown and hence must be measured and estimated. As any information about 
the Ki^s comes from the observational device itself, any estimate of the Ki's are 
resolution-limited as well. Therefore, we represent the measurement process Ki 
as a p X p matrix. This captures the idea that, in many problems, the resolution 
is fixed by the instrument and does not change as more data is collected (that 
is, as n — >■ oo). 

An early formal consideration of the sequential inverse problem is found in 
the literature on developing loss-less analogue-to-digital conversion techniques. 
The recovery of the orginal, analogue signal is an inverse problem as there 
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is not a unique analogue signal eorresponding to each digital signal. This re- 
sult is formalized in the quantity referred to as the Nyquist rate, or frequency 
(Mallet, 2009, Chapter 3). If the signal is instead sampled multiple times at 
different, carefully chosen sampling rates, Berenstein and Patrick (1990) and 
Casey and Walnut (1994) find conditions under which the original signal can 
be reconstructed in a loss-less way. Note that, as opposed to our paper, these 
approaches deal with only the case where e = and the Ki, which correspond 
to the sampling rate, can be chosen by the experimenter. 

Subsequently, the sequential inverse problem is considered in a series of ar- 
ticles, beginning with Plana and Bertero (1996), in which two methods are in- 
troduced. The first corresponds to Tikonov-Phillips (TP) regularization (known 
in statistics as ridge regression) adapted to the sequential problem. The second is 
an iterative method based on Landwieber iterations (LI). See Bertero and Boccacci 
(1998) for an overview of the Landwieber iterations method in inverse problems. 
Though the above methods have been successfully implemented in the past, 
most notably in the software package AIRY (Correia et al., 2002), it has two 
shortcomings: the methods correspond to restrictive choices among all possible 
estimators and they offer no automated method for choosing the introduced 
tuning parameters. 

Remark 1.1. The Tikonov-Phillips and Landwieber Iteration methods can 
readily be derived by the formalism developed in this paper (see equations (19) 
and (17)). Therefore, we get for free a principled method for setting the tuning 
parameters, as well as a suite of new estimators. 

The goal of this paper is to develop and investigate a statistically efficient 
estimator of 9 from the sequence of resolution-limited inverse problems intro- 
duced in equation (1). We require that any estimator must satisfy the following: 
(i) it leaves no user-defined tuning parameters and (ii) the estimator 0„ based 
on an n-sequence can be efficiently updated to produce the estimator 6n+i after 
observing Y„_|_i. Both requirements are particularly important in applications 
like the LSST, where it is inconvenient (or impossible) to access the entire past 
data stream with each new observation and hence the data must be processed 
in near real time. 

In Section 3.3 we discuss two reasonable approaches to this problem based 
on collapsing the sequence of operators (Ki) into one summary operator, in one 
case averaging the operators and in the other concatenating them. Wc show 
that both approaches do not satisfy conditions (i) and (ii). 

Satellite Imaging: To fix ideas, we introduce a typical instance where the 
observations form a a sequence of resolution limited inverse problems. During 
satellite imaging operations, a location on Earth is imaged many times over the 
life span of the satellite. The quality of the recorded observations can be low and 
variable due to changing atmospheric and/or weather conditions. See the left 
column of Figure 1 for a representative panel of four such images taken of the 
White House and surrounding buildings. Note that the amount of blurring in 
each image i, corresponding to the forward operators Ki, can be very different. 
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However, the pixelization induced by the observational device is fixed over the 
sequence of images. 

Our proposed estimator 9n takes these images and sequentially creates a new 
estimate of the unknown signal 9 after each observation Yi (right column of 
Figure 1). Each row of Figure 1 is a new observation and 9n after being updated 
with that observation. Notice that the recovery is quite good, even after only a 
few images as input. We emphasize that there are no choices to be made by the 
data analyst: all tuning parameters are chosen in an automatic, data-dependent 
way. 

This paper is organized as follows. In Section 2 we give a careful overview 
of our method and provide justification for the assumptions made, with greater 
exposition occurring in Appendix A. In Theorem 2 and Theorem 3, we give sup- 
porting theory for our estimator that both shows uniform consistency over the 
parameter space and an asymptotic oracle inequality. These results show both 
that our estimator will get the correct answer eventually, no matter the signal 
in our parameter space, and that our estimator makes essentially as efficient use 
of the data as if we knew the signal 9. In Section 4, we provide an example of 
our framework in action on simulated data. 

Notation. For A <E C^^p and a G C, define A* to be the Hermitian adjoint 
of A. Correspondingly, define |ap = a*a and |Ap = A* A to be the squared 
complex modulus of a scalar and matrix, respectively. Likewise, for any vector 
X e C'', ||a;|p = x*x. If AA* = Ip = A* A, then we say that A is unitary. We 
utilize bold faced font for vectors: b„ G C with its j*'* entry notated 6„j and 
the subscript n indicates dependence on the sample size. Similarly, Anj is the 
j*'' element of the main diagonal of the matrix An. We abuse notation slightly 
by using A as both a vector in and as a function from to given by 
component-wise multiplication. 

2. Methodology and main results 

We begin this section by making the following assumptions, building on the 
notation introduced in equation (1): 

(Al) The noise parameter e > is known. 

(A2) The (Ki)"^-^ are known smoothing matrices. 

(A3) There exists a unitary matrix 5* e C^^p and diagonal matrices Di such 

that Ki = ^Di^* for ah i = 1, . . . , n, . . .. 
(A4) There exists an < oo such that for all j there exists an 1 < < such 

that IA.il > 0. 
(A5) Define A„ := lAjP- Then the (A) arc such that 

maXjA„j 
lim — r^^— - — - < oo. 

n->cx) miUj Anj 

Assumptions (Al) and (A2) are very standard in the statistical inverse prob- 
lem literature. We discuss a strategy for estimating e in Section 3.2. Assumption 
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Observations Estimates 



Fig 1: Example of images of the White House from a satehite and associated 
recovery of the unknown signal 9 using our proposed estimator. In the left 
column (Observations) the different amounts of blurring are due to varying 
atmospheric conditions and correspond to the forward operators Ki in equation 
(1). In the right column (Estimates) we report the output of our estimator using 
the data in the left column. Each row corresponds to making another observation 
Yi and updating our estimator with this new data. We emphasize that there are 
no choices to be made by the data analyst; all tuning parameters are chosen in 
an automatic, data-dependent way. 
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(A4) is also commonly made and it ensures that, at some point, the entire signal 
6 is identified and loosely corresponds to the intersection of the null spaces of 
the {Ki)2^^ eventually only containing the zero vector. Assumption (A5) merely 
prevents a pathological case where the Ki are becoming more ill-conditioned 
without bound as n — > oo. Assumption (A3) is crucial to our method and while 
the reason for it will become clear, the following theorem provides a general 
family of matrices that satisfy it: 

Theorem 1. // the {Ki)"^i all correspond to the convolution operation, then 
there exists a unitary matrix ^ and a sequence of diagonal matrices (-Dj;)"=i; all 
of which could have complex entries, such that (A3) holds. If 9 is a one (two)- 
dimensional signal, then the Ki are (block) circulant and the entries of the 
matrix 5* are the discrete one (two) -dimensional Fourier basis and the entries 
of Di are the corresponding discrete one (two)-dimensional Fourier coefficients. 

Hence, we see that (A3) is more general than the convolutional assumption 
made in Piana and Bertero (1996) and many other works concerning statistical 
inverse problems. See Appendix A for a proof of Theorem 1 and an investigation 
into more general families of matrices that satisfy assumption (A3). 



2.1. Overview and main results 



An overview of our procedure is as follows. The parameter 9 and each ob- 
servation Yi us rotated by The rotated Y^'s are combined together to 
form a sufficient statistic B„. The estimators we consider are of the form 
9 = \E'A(B„) := '^{XjBnjYj^i- Define this set of estimators to be 

£ = {9 = *A(B„) : A € C^}. (2) 

We choose from the estimators in £ using a combination of minimizing an empir- 
ical estimator of the risk and some additional regularization parameters. Define 
our estimator to be 0„ = ^'A(_B„), where 

The form of this estimator is derived in the text containing and preceding equa- 
tion (20). We set nl := {p - 2) (l + ^f^f^) ■ Note this choice of ^1 is moti- 
vated by Brown, Nie and Xie (2011) in which it is shown that ensemble mini- 
maxality in the heteroscedastic case holds for the soft thresholded James-Stein 
type estimator. 

We define our loss function to be the P norm with associated risk 

R{(),9) -.^EWO -9\\^ (4) 
and set 6 := {6* : He'H^ < T^} for any < < oo. Then 
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Theorem 2. Under assumptions (Al) - (A5), 

limsup sup 7,~^i? [On-, 9] < C < oo (5) 

where 

7„ = max- — . 

If Dij = Dj for some Dj e C, then 7„ x 1/n; that is the parametric rate. 
However, the forward operators (Ki) in effect ensure that each observation 
doesn't decrease the risk equahy. The quantity A„j relates to how much in- 
formation is present in the first n observations about the j*'' component of 

Additionahy, we can compare our estimator to the £-oraclc 6^ 
Theorem 3. Suppose assumptions (Al) - (A5) and let 

R{e^,e) mini?(^,e') 

be the risk of the £ -oracle. Then 

R{0n,O^ <R{0,,0){1 + O{1)), (6) 

where the term 0{1) does not depend on 9. 

An interesting extension of this model is to the random operator setting. 
That is, what is the impact of having Ki being drawn from some distribution? 
We answer this question in an interesting case. 

2.2. Random Eigenvalues 

Suppose that the (Ki) are random operators such that Ki = 'i'Di'^* for all 

i = 1, 2, . . . and diag(Z?i) '^"^ I?, where V is any p-variate complex distribution 
that doesn't have too much mass near zero. Specifically, 

(B4) The distribution V is such that there exists an a where for < t < a 

(I < r) = (rf. 

This is a stochastic extension of assumption (A4) as it allows the random 
eigenvalues to be arbitrarily close to zero in magnitude but with the probability 
of them being small going to zero at an appropriate rate. Lastly, let (Wj) and 
{Di) be mutually independent. 

Theorem 4. Suppose assumption (B4-) holds with some p> 1. Then 

2 

lim supE(£,^),(Y ) 6'„-6i =0 (7) 

where Ej-^).) (y.^) corresponds to integration with respect to the joint distribution 
of [Di) and (Fy). 
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2.3. Rotations, estimators, and tuning parameter selection 

Returning to equation (1), for i = 1, 2, . . . we define Xj := **Yj, (3 := and 
Zi := ^*W,. Then it follows that 

X, = A/3 + eZ,. (8) 

Note that in this case Zi '^'•^'^ CN{0, Ip, ^I'^E'^)^. It is also convenient to look at 
equation (8) component- wise, 

X,, =A,/3,+eZ,, (9) 

for j = 1, . . . ,p. Note that for these multiplications to be defined, we have to 
think about W being embedded in C^' by having imaginary part equal to zero. 
Wc follow this convention without comment in what follows. 

Remark 2.1. Note that the (Z,) are degenerate complex Gaussian vectors in 
the following sense: if we think of a p dimensional complex Gaussian as a 2p 
dimensional real Gaussian with some covariance matrix, then the Gaussian ac- 
tually has values in a p dimensional subspace of M^^. Thus the random variables 
don't have a density with respect to Lebesgue measure on the full space C^, 
among other complications. 

Remark 2.2. Commonly, the sequence space formulations found in equation 
(8) and equation (9) are accomplished by a real, orthogonal matrix instead of 
a complex, unitary one. Allowing for the sequence (J^'OiLi to share the same 
eigenvectors necessitates permitting to be complex. This makes equation (9) 
more complicated than the conventional normal means problem in at least two 
ways. First, as stated above, the random variables are complex. Second, and 
more importantly, the model is heteroscedastic. This leads to a much more 
involved theory than in the homoscedastic case, such as in Brown (1975), and 
is still the topic of contemporary research (Brown, Nie and Xie, 2011). 

Lastly, define 

Bnj ■■= I '.12 = Pi + £K] (10) 

where A„j := Yh=i 

This quantity is particularly important, as evidenced by the following theo- 
rem 

Theorem 5. Under the model introduced in equation (1) and (Al) - (A4), the 
random vector 'Bn '■= {BnjYj^i sufficient for (3 in equation (8). 

This claim can be seen by noting that the map $* is measure preserving. 

complex normal has an extra parameter compared with a real normal. For a zero mean 
complex normal random variable Z, this is denoted CA''(0, EZZ* , EZZ^). 
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As 4* is also unitary, we can define an equivalent risk to the one defined in 
equation (4) in terms of /? 

R{d,9) ■.= E\\9-e\\^ =E\\'f*{§-9)\\^ =E||/3-^|P =: (11) 

Any risk computations made under the data, which is {Xi)f^-^ in our nota- 
tion, are equivalent to those made under a sufficient statistic (Bahadur, 1954, 
Theorem 7.1). By Theorem 5, B„ is sufficient for /? and hence for all measurable 
functions of the data that are not functions of B„, there exists an estimator with 
equal risk that is a function of B„. In fact, the expectations in equation (11) 
are equivalent under {Xi)f^i and B,i. Therefore, for each n, we can treat B„ as 
the data and formulate estimators based upon it. 

To develop an automatic procedure for signal estimation in sequential in- 
verse problems we begin by regularizing an unbiased estimator of (3 through the 
use of a smoothing parameter vector. We choose this smoothing parameter by 
minimizing an estimate of the risk. This type of procedure, known generally as 
unbiased risk estimation, has been revisited regularly in many fields for solv- 
ing various problems related to denoising (Stein, 1981; Donoho and Johnstone, 
1995). However, as inverse problems generally result in unstable estimators of 
both the parameter /3 and the risk R, we compensate by including additional 
regularization. 

The specifics of our approach are related to the procedure found in Beran 
(2000). However, the goal in Beran (2000), unlike our paper, is the estimation 
of the regression function in an assumed linear model instead of the coefficients 
themselves. That is, referring to the notation in equation (1), the estimation 
of KiO instead of the estimation of 0. This is an important distinction as both 
estimating 9 is intrinsically harder than estimating Ki6 and 6 is the object 
of actual interest. The practical implications of these differences is that only 
minimizing an unbiased estimate of risk, as is the procedure in Beran (2000), 
provides insufficient regularization. As well, the theoretical justification that 
appears in Beran (2000), is essentially entirely asymptotic in p. This is a regime 
we do not consider relevant for the problem at hand. 

To begin to formulate an estimator of /3, and therefore 9, we state the follow- 
ing: 

Proposition 6. Define tl'j := (|i?„jp— e^/A„j)/|_B„jp. Then the random func- 
tion 

i?„(A):=^(A,-V',)'|S„,f (12) 

provides, up to a constant independent of A, an unbiased estimate of R{\). 
Additionally, 

min R(X) = mini?(A) (13) 

where C = [0, 1]^ is the p dimensional hypersquare. 
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The first part of tlic proposition provides an unbiased estimate of the risk 
while the seeond part implies that we gain no improvement in risk by allowing 
A to have values outside of £. 

Using Rn from (12), define for any Q C 

A := argmin _R„(A) (14) 
g 

whieh produces an estimator of (3 via 

/3^:=A^(B„). (15) 

Lastly, we recover an estimate of 9 by forming 9^ := ^j3^ . 

As any choice of Q results in an estimator via the above machinery, there 
are in principle many possible choices. We focus on Q = C, which by inspection 
of equation (12), results in 

where as usual (•)+ = max(-,0) is the soft thresholding function. Other choices 
can and should be explored in further research into estimation in sequential 
inverse problems such asA^ := {As C : Ai > A2 > ... > Xp}, which induces 
a monotonicity constraint on the estimated coefficients, or block methods of 
piecewise constant weights (Cavalier and Tsybakov, 2002). 

Additionally, the aforementioned Tikonov-Phillips regularization and Land- 
wieber iterations methods correspond to specific subsets of £. The Tikonov- 
Phillips estimator takes the form 

which can be rewritten as an element of £ by defining 

X] := -r^^ (18) 

with associated estimator (3^ — A'''(Bn). 

The Landwicber iterations estimator is by nature iterative. However, it has 
an equivalent formation in the form of the following linear smoother 

A^"' = (l-[l-rA„,r) (19) 

where 7 corresponds to the number of iterations and r is a relaxation parameter. 
The associated estimator is /3(T''^) = A'-''''^-'(B„). Hence, this procedure gener- 
alizes the results in Plana and Bertero (1996) by providing a principled tuning 
parameter selection method. 
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A problem arises if wc choose smoothing parameters in this fashion in in- 
verse problems: insufRcient regularization. This is due to i?„ being an unstable 
estimate of R for the same reason as B„ is an unstable estimator of (3. 

Instead of regularizing the risk estimator, we modify the weights directly to 
provide additional regularization. However, we record our belief that regularizing 
Rn by limiting how ill-conditioned the risk estimator can become and then 
minimizing this biased estimator of the risk should provide a suite of interesting 
estimators via the above machinery. Define 

where the parameter $7^ is specified before Theorem 2. Lastly, define our esti- 
mator of 6 to be 

0n *A(B„). (21) 



3. Computational concerns, variance estimation, and alternate 
methods 



3.1. Computations 

- g 

The specifics of the computation of an estimator theta depend on the sub- 
set Q. However, i?„ is a convex objective function. Hence, if is a convex 
subset of MP, then the solution can be found both efficiently and uniquely. 
Of the estimators mentioned above, all except A4 have a closed form solution 
and therefore trivial computation. The minimization of Rn over M can be ac- 
complished by a well known algorithm called Pooled Adjacent Violators (PAV) 
(Robertson, Wright and Dykstra, 1988) that transforms the least squares so- 
lution into the monotone solution by taking weighted averages of adjacent 
elements of 4' that violate the monotonicity constraint. 

Additionally, in the case of convolution, the vector A„ and the random vari- 
ables (Xi) can be computed via the Fast Fourier Transform, which implies 
0{plogp) computations and is of course the archetypal instance of an efficient 
algorithm. However, for more general matrices Ki, the eigenvectors must be 
computed via a conventional eigenvector solver, which necessarily has computa- 
tional complexity 0{p^). This could become prohibitive for large scale problems. 
There do exist modern approximation methods for eigenvalues and eigenvectors 
that could be used instead, such as in Halko, Martinsson and Tropp (2009). 
However, we do not explore this idea further in this paper. 

An additional feature is that for the computation at step n, it is not necessary 
to keep the entire history (i^)f^i and (ifi)"^i. Both Bn and A„ can be computed 
from aggregate information. Hence, we can produce an estimate of 9 given only 
access to a few summary statistics which get updated after each new observation. 
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3.2. Estimating the variance parameter 

Estimating the variance parameter can be accomplished in a consistent way by 
setting aside a subsequence A/" of N and computing the estimator 

Here, we have computed the estimator after the first n' entries in M . 

Ahcrnativcly, we can take advantage of the observational process to acquire 
a good estimate of e. As we make observations, occasionally some will be of 
exceptionally poor quality. This observation will be less helpful for recovery in 
general and provide almost no information about the higher order components 
of the vector /3. 

Suppose now that M is the set of all indices i such that Yi is a low quality 
observation; that is there exists a p' such that for j = p' , . . . ,p, the li^ijp are 
small. In general, p' could depend on i, but wc do not consider this complication 
here. Form the following statistic 

(23) 

9=P' 

Then Eef = e^ + j^ Y.q=p' lA^Pl^jP and we report 1/n' Y^t^Af as our es- 
timator of e^. This is in general a biased estimator of the variance. Nevertheless, 
it is still useful. First, it is conservative owing to its positive bias. Perhaps more 
importantly, this estimator provides an interesting situation where the lowest 
quality parts of the lowest quality observations provide the best performance. 



3.3. Averaging is not enough 

In equation (1), conventional statistical practice would suggest averaging the 
observations (K;) directly. However, we show here that this leads to suboptimal 
results. Specifically, averaging gives thes following model 

Yn = Kne + ^W (24) 

where, under assumption (A3), Kn ^ ^Dn'^* , 

Dr, := l/nJ27=iDi, F„ := l/nJ^'^^^^Yi, and W - ^(0,/p). This can also be 

equivalently expressed as 

B„ = \Dn I -^D*JCn = P+^\Dn\ A>* I^. (25) 

Here, X„ = 5'*i^n. We define the corresponding set of linear estimators to be 
£:^{§ = 4'A(B„) : A e C^}. 
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Note that wc can write equation (24) without any assumptions about the 
eigenvectors of the forward operators. However, under assumption (A3), the 
following theorem supports forming estimators based on equation (10) instead 
of equation (24). 

Theorem 7. Suppose for a fixed 6, 

i?i = inf E||^-6'||^ and i?2 = mf E||^ - 6i||^, 

where the expectations in Ri and R2 are under B„ and B„, respectively. Then 

Ri <* R2 

where '<* ' means 'strictly less than except when Di = D for all i and some D. ' 
That is, the oracle linear risk based on equation (10) is strictly less than the 
oracle linear risk based on equation (24) . 

Remark 3.1. Note that the classic Tikonov-Phillips estimator based on the 
Yn is of the form 

bridge = (kIKu + Tiy^KlYn. 

This is equivalent to 

^■idgc = ^{\D^\'' + Tl)-^\Dn\''\Dn\~^D*Ji,, = *(|:D„|2 + r/)-i|;D„|2B„, (26) 

and hence the Tikonov-Phillips estimator is in £, among many others. 

An alternative approach relies on forming /C„ := [KJ , . . . , K^]^ , y^i := 
[Yi^, . . . , Y^Y, and Wn ~ N{Q, I„p). Then it follows that 

yn=ICne + eWn. (27) 

However, estimators based on this approach, such as spline type estimators, rely 
on accessing the entire history of observations (Yi) and forward operators (Ki). 
This is computationally infeasible as this means both keeping and repeatedly 
accessing the entire sequence of observations. Hence, this approach doesn't sat- 
isfy our requirement of an estimate at time n being efficiently updatable to a 
new estimate after recording 

4. Supporting simulations 
4-1. Description 

In this section we present visual results of using our estimator 0„ to reconstruct 
various signals, given access only to smoothed and noisy, but repeated, observa- 
tions of that signal. In both cases, we compare our estimator, 0„ to bridge from 
equation (26), with the smoothing parameter r chosen by minimizing generalized 
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Fig 2: The left column corresponds to 0^™°°^^ and the right column corresponds 
to gp'^^^'^'^. The top row is a plot of the signal itself, along with the signal after 
the minimum (dashed line) and maximum (dashed and dotted line) amount of 
smoothing. The bottom row is an example of the recorded data after corruption 
by smoothing and noise. Notice that in smaller peak is completely 

obscured. 



cross validation (GCV). For a quantitative comparison, we use the normalized 
relative risk (RR) given by 

We estimate RR by averaging 100 runs of our simulations. 

For each of the signals introduced below, we set p = 256 and fix the noise 
parameter e to be such that the signal-to-noise ;= ||^^||l/(pe) — 1. For these 
examples, we admit Ki that arc an equally weighted mixture of three Gaus- 
sians, normalized to have h mass equal to 1, with means /ii = —0.75, /Z2 = 
0.00, and = 0.50, along with standard deviations aiq ~ 0.5 -I- Eqi^ where 

Eqi exponential(l) and q = 1,2,3. Note that this implies that the Ki are 
not symmetric. Also, note that Gaussian-like smoothing represents one of the 
worst cases as it exponentially down- weights the (3j for large j. 

We consider two signals for estimation, which we refer to as 6''''°°°*'^ and 
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^peaked (piguj-Q 2). The first signal, 6)™^°°'^^ g^^-^ of two Gaussians functions 
that are filtered by a Gaussian-tapered filter. This filter is additionally enforced 
to be zero above the p/2 frequency. Hence, 0™°°"! is very smooth and compactly 
supported in the frequency domain. This example is instructive as a smooth 
function should be well represented by the eigenvectors 'i> of the smoothing 
operators Ki. Also, a compact representation in frequency domain will reveal 
the effectiveness of the soft-thresholding in zeroing out the appropriate Bnj , ie: 
those that correspond to the /3j that are zero. See the left column of Figure 2 
for a plot of 0^"^°°^^ (top) along with a typical example of a noisy, smoothed 
version that comprises the recorded data (bottom). 

Additionally, we consider the opposite situation by defining a signal 0P<=3.kod 
that is the sum of three sharp, non-smooth, peaks. This signal is difficult to 
represent with the eigenvectors of smoothing matrices but is common in sig- 
nal processing as it corresponds to both spectra from biochemical analysis and 
nuclear magnetic resonance imaging (nMRI). Note that the smallest peak is 
completely obscured by the smoothing and noise. See the right column of Fig- 
ure 2 for a plot of 0P<=3.kcd j^^^Qp^ along with an example of a noisy, smoothed 
version (bottom). 

4.2. Results 

In estimating either signal, 0^"^°°*-^ or QP'^^^'^'^^ the estimator On converges rapidly 
to the truth. See Table 1 for the RR of On and ^ridgo used on both signals. In 
each case, for n = 50, the RR are approximately the same, with bridge having 
a slight edge. Every sample size thereafter shows substantial advantage of On 
over bridge, culminating with a factor of two improvement in RR after n — 300 
observations. 

For estimating ^smooth^ both estimators have substantial oscillations for low 
sample sizes. However, due to On having a soft-thresholding effect, some of the 
entries in our estimator of /? are zeroed out. In contrast, bridge only shrinks the 
coefficients and hence still has substantial fluctuations after n = 300 observa- 
tions. See Figure 3 for graphical results. 

For the signal QV'^^'^'^ ^ estimates the true height of the peaks accurately and 
quickly. In particular, the secondary small peak is definitively identified with the 
correct shape and height for n = 50 observations, while for ^ridgo, the secondary 
peak is much less clear. There are still some remaining oscillations at n = 300, 
resulting from unavoidable consequence of using the eigenvector basis. This is a 
well-known phenomenon in Fourier analysis known as the 'Gibbs effect.' Even 
with this obstacle. On converges quickly to 0^'^'^'^'^'^ . See Figure 4 for graphical 
results. 

5. Discussion 

In this paper, we provide a general method for recovering an unknown signal 
given a sequence of noisy observations that are only indirectly of that signal of 
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On ^ridgc 

Fig 3: Estimation of 6*™°°*^ by §n (left column) and ^idge (right column). The 
sample sizes range from top to bottom, n = 50, 100, 200, 300. Our estimator, 
0„, quickly converges to 6'"'°°°*^. However, 4idge, which doesn't zero out any 
coefficients, still has substantial fluctuations after n = 300 observations. See 
Table 1 for RR results for this simulation. 
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On ^ridgc 



Fig 4: Estimation of 0P''^^'^<^ by 0„ (left column) and ^ridgo (right column). The 
sample sizes range from top to bottom, n = 50, 100, 200, 300. Our estimator, 
0„, estimates the true height of the peaks accurately and quickly. In particular, 
the secondary small peak is definitively identified with the correct shape and 
height. There are still some remaining oscillations at n = 300, resulting from an 
unavoidable Gibbs effect from using the eigenvectors as a basis. See Table 1 for 
RR results for this simulation. 
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i?R(9Hdgc,«''™°°"') 




if,R(eHdgc,6'P™'"='^) 


n = 


50 


0.291 


0.288 


0.148 


0.151 


n = 


100 


0.210 


0.223 


0.116 


0.171 


n = 


200 


0.149 


0.199 


0.092 


0.149 


n = 


300 


0.120 


0.173 


0.079 


0.141 



Table 1 

The RR for the two considered simulations. These are estimated by averaging 100 runs of 

our simulations. 



interest. Our estimator, has many favorable properties. It has computational 
efficiency in the sense that it can be updated with a new observation without 
need to reference the entire sequence of observations. Instead, it relies on only 
a few summary statistics that need to be maintained and updated. Though its 
computation is predicated on finding the eigenvectors and eigenvalues of poten- 
tially large matrices, the implementation is straightforward and generalizable 
to higher dimensional signals such as images. Additionally, there exist accurate 
methods for the approximate computation of the eigenvectors of matrices that 
could in principle be used to speed up the computation of 4'. 

Also, On is statistically efficient as well. The uniform consistency and oracle 
inequality results show that it is making about as good a use of the data as pos- 
sible. Likewise, 0„ has worked very well in our experiments so far, as evidenced 
by the results in Figures 1, 3, and 4. Our estimator can recover the unknown 
signal 6 efficiently, requiring very few observations. Also, referring to the sec- 
ond row of Figure 1, the collection of a very poor observation merely doesn't 
improve the estimate instead of decreasing its quality. This is in opposition to 
many currently implemented techniques such as straight averaging, where low 
quality observations decrease the quality of the recovery. 

Appendix A 

This section gives warrant for assumption (A3) in Section 2. Although a slightly 
weaker version of assumption (A3) is all that is actually required (that only the 
right eigenvectors need be the same instead of both left and right eigenvectors) 
we leave it in its current form for simplicity of exposition and conditions. 

Two real matrices A, B share the same eigenvectors if they are simultaneously 
unitarily diagonalizable; that is, there exists two diagonal matrices Si,S2 and 
an unitary matrix such that A = ^E'SiVl'* and B = ^Yi2^* ■ Note A and B 
must of course be unitarily diagonalizable, which implies by the spectral theorem 
that A and B are normal; that is A^ A = AA^ and B^ B = BB^ . The following 
theorem characterizes simultaneous diagonalizability. 

Lemma 8. Let K, be a commuting family of normal matrices. Then JC is also 
simultaneously unitarily diagonalizable. 

Proof of Lemma 8. By the Schur unitary triangularization theorem (Horn and Johnson, 
1985, Theorem 2.3.1) if /C is a commuting family of matrices, then there is a 



Homrighausen and Genovese/Estimators for Sequential Inverse Problems 



19 



unitary such that 4* A' 4** is upper triangular for every K Cz IC. Hence, as nor- 
mahty is preserved under unitary congruence and a triangular normal matrix 
must be diagonal, the result follows. □ 

Though all Toeplitz matrices commute asymptotically as the number of rows 
and columns increases, not all Toeplitz matrices commute for a fixed size. Many 
subsets of the family of Toeplitz matrices satisfy Lemma 8, however. In par- 
ticular, all circulant matrices commute (Gray, 2001, Chapter 3.1). This shows 
Theorem 1. 



Appendix B 

We utilize the following notation in several of the below proofs. We use < to 
indicate 'less than or equal to up to a constant independent of n.' Also, it is 
convenient to think of a complex number a = ai + 021 as an element (oi, 02) € 
R^. In this case, we use | ||a|| p = a\+a\ as a norm on R^, as the complex modulus 
is not technically defined on elements of M^. Additionally, Z ^ N{0,l2) is the 
two dimensional standard normal. Lastly, we define s„j := ri^e^/A„j. 

We begin with a lemma that will be used in the proofs of Theorem 2 and 
Theorem 3: 

Lemma 9. Let /i € be a vector, E = diag{a1, cr|) be a diagonal matrix with 
positive entries, and be a real, positive constant. Then 

P(|||m + E^/'^||P <c2) <P(|||/, + a,„,,Z|||2 <c2) (29) 

«/ lll/^lll > c and a,nax = max{cri, CT2}- 

Here, we don't give a formal proof but provide intuition. The probability in 
equation (29) corresponds to the amount of the mass of an elliptical normal, 
aligned with the cononical axis, that resides in a ball of radius c at the origin. 
Hence, if |||^||| > c (that is, the mean is outside the ball) a more spread out the 
normal results in more mass inside the ball. 

Proof of Theorem 2. For simplicity, write /3„ :~ A(B,i). Then 

supi?„(0,0) = supi?„(/3„,^), 
eee pee 

where B -.^ {P : ||/3|p < T^} = 1'*e. Then we wish to show that 

lim sup sup i?„ (/3„ , /3) = , (30) 

where the subscript n on R has been included to emphasize the dependence on 
n. 

We begin by defining the following set 

A, :={^:|S„,(^)P>4.} 
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where lo ranges over the measure space on which the random variable Bnj is 
defined. The utihty of defining Aj is 



PnjlA, = 1 - 



A \B -12 



(31) 



Additionally, write Bnj ~ (ij + Znj as a mean term plus stochastic term, 
where Znj is the j*'' entry in the complex normal eA^^ ^i{Di'^*Wi). Then the 
following bound on the j*'' term in the loss holds: 



Bnj - /3j 



Znj 



' si,{Pj+Znj) 
\Pj + -^njP 



(32) 



To show that the expected value of the first term goes to zero in expectation, 
observe: 



EIZ 



I ''nj 



+2. ■ 



.2 



< (1 + 2f]„ + 



As < C < oo for n large enough for some C by assumption (A5), 



EIa, (I )^ - 0(1/ A„,) 



(33) 



uniformly in /3. 

For the second term, lA';|/3jP, we need to show 



limsupsup^P(A^)|/3j|2 = 0. 



(34) 
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First, wc compute the eigenvalue matrix A„j of the covariance matrix of Z, 
as a vector in . By the properties of complex normals^ 




Hence, the entries in A„j are ^nj,i = s^/^nj +^Qj ^^'^ ^nj,2 = ^^/^nj — ^Cjj, 
which are both strictly positive. Also, define U to be the associated eigenvector 
matrix. 



Though it is clear that P(Ap|/3jp goes to zero pointwise, the worst /3j is 
arbitarily close to zero. Hence, to show uniform convergence, we define a pa- 
rameter T^j. For each j, define Bj {/3j : < 7"^} and split this set into 
Bj = Bjn U i3|„, where 

6|„:={/3:4<|||/3,||p<n. 

Also, as 1 1 1 ' 1 1 1 is invariant under orthogonal operations, we can rotate everything 
by the eigenvectors U. Denote rotation by C/ by a tilde; that is, Pj := U (3j. Then, 



sup^P(A^^)|/3,f <^ sup P(A^^)|/?,f 



<^maxj sup Pft(A^^)|/3,f sup P(A^^)|/3j| 



p 



<^max<^T2^., sup P(Aj")|/3, 



p 



2 



= J2 max <^ 4-, sup P(|||[/(/3, + < 4,)lllft l 



Then continuing on with the second term in the max, and using Lemma 9 
with \ > Snj, which happens if > s^y, 

^Technically, this covariance matrix is ofE by a constant, but this is not relevant for our 
current purposes. 
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sup Fim+Ai/;z\\f <si^)m\\f 



< sup P(|||;3,+A„,ax^|||'<4)|||/3,|ir 

< sup 1-$ |pj/A,nax|||-S„j/A 

max I I I MM? I 

sup (1 - $ (1/Aniax(u - 

sup (l-«'(t))(A,„ax(t + Snj))^ 



|2 



\Lx sup (l-$(t))(i + S„,f 

S„i<t<^ S„i 



— '^max sup (1 - $(t))(t + 1)^ fornlarg e enouen 

0<t<oo 

< A2 

— max 

The last inquality follows by Figure 5 and by noting that (1 — + 1)^ 

continuous, unimodal, and bounded by 1. 



Fig 5: Plot of (1 - + 1)2 

Therefore 



sup P(A^=)|/3,f - max{r2, ALx} (3 



Hence, it is sufficient to choose r^^ = 2s^^ and to note that 

^max ^ ^ ^ I '-^nj ■ 
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This implies 



(36) 



As we are summing over j in the risk, we conclude that 



where 



limsupsup 7„ i?(/3„,/3) < oo 
Ti^oo pets 



7„ = max ■ 



□ 

Proof of Theorem 3. We use the same notations and conventions as in the proof 
of Theorem 2. Note that if we define tr^y- = e^/A„j, then the linear oracle risk 
is 



R{P,,P) = ^ min /?) = ^ 2 \i7y 

/3=A(B„) ~t'^nj + \Pj\ 

we bound the j*'' term in the loss: 
= 1a 



l/3j P^nj" 



j=l ^nj 



(37) 



+ 



l/3j 



+ 



+ 



1a; 1/3, 



1a, 



^rij I 



l/3j+Z„,f 



1a;I/3, 



nj 



1a; 1/3, 



l/3,+^n,f 



1/3, 



+ 1a; 1/3, 



< 1^. 



4,1/3,, 



n, I 



\z„ 



/ 4,I/3,P 



+ 1a; 1/3, 



4,+f^^l/3,P 



1/9, + ^„ 



+ lA;|/3,f 



By the previous proof, we see that the expected value of the first and third 
term go to zero uniformly over P & B at rate 0(1/A„,); the same rate as the 
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oracle. For the second term, notice that 



s: 



for n large enough, by assumption (A5). Then our goal is to show that 

lim sup sup ILGnj < oo . 

n->oo peB 

First, due to Gnj being rotationally symmetric (once we use ||| • ||| instead of 
I'D, we renormalize to transform Z„j into a vector Z with independent standard 
normal components 



|2 



^3 1 



3j + Znj IIP 



lC/T/3, 



|2 



|2 



1a, 



■aK^zi 



We define A„j and [/ in the previous proof and Aj ■= + A^^^Zlp > s^y. 
We break bounding EGnj into cases. 

Case 1: |||/3,|||2 < 4^- 

We see from the definition of 1^ 



Note that by the nonnegativity of G 



Case 2: |||/3||M > 



raj 



nOO 

/ P(G„j > r) dr. 

JQ 



As an aside, the random variables considered don't put any positive mass at any 
points, so we don't need to worry about whether the boundaries of integration 
are included. For r > 0. 



P{Gn,>r)^p[sl^<m + Alfz\\\'< 



T>mjiii 
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Therefore, for any > 0, 

/•OO 

EG,y = / P(G„, > t) dT 
Jo 



, 11131. 



|2 

„2 



< + P(G„, > c') 



<c^+(^']p(|||/3 + A,l{.^Z||P<''''^''''' 



If > 1, then the mean of of the random variable ^ + ^l/fz will be outside of 
the circle centered at zero with radius ||/3||/c. Hence, by Lemma 9 if we define 
'^max ■— max{diag(A„j)}, then it follows that 



< 



AmaX'^l 1 1 ^ 



(39) 



Using this, observe 



nj 



< 



< 



nj 



P IP/An 



1 - $ 



^|||2 . IP/An 



1-- ) lll/3/An 



1-$ 1 



lll/3/A„ 



n] 



1 - $ 



1-- u 



r 1 - $ 1 - 



Where we have transformed t ~ |||/?|||/Ai„ax- Hence, as sf^^ x A^^^^ and 
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we see that 



, P |||/3 + Aifz||p<^^) =0(1), 



independent of /3. And we conclude that 

EGnj - 0(1), 

again, independent of /?. This ends the proof. □ 
Proof of Theorem 4. Observe 

hm supE(^.),xJ|/3-/3||' = hm supE(c.)Ex„|(D.)ll/3 " 

< hm Em-) supi?(/3,/3). (40) 

n^oo * 

Therefore it suffices to exchange the hmit and integral To accomphsh this we 
appeal to the following bound from (32). Define /„ := sup^^g Ej!s:^|(£).-)||/3 — /3|p. 
Then 



= El/3.-/3.l 



< SUpEx„|(Di 

/3eB 



P 



J = l 
P 



Efj =• 5n- 

J = l 



Therefore, if we can exchange the limit and integral, then by the previous 
two proofs we can conclude that the limit is zero. We appeal to the following. 
We say a set of random variables {Xt : t € T} is uniformly integrable if 

lim supE\Xt\liXt\>x = 0. 
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It holds that ii Xt X with probabihty one and {Xt : t G T} is uniformly 
integrable, then MXt — >■ MX. Hence, we wish to show that /„ is uniformly 
integrable. It holds that if each term in the sum over j is uniformly integrable, 
then /„ is uniformly integrable as well. 
Note that 



E|/,|l|/^.|>, = xPif, >x)+ P(/, > y)dy 

J X 

poo 

< xP{g, >x)+ P{gj > y)dy 

J X 

For large x, x > T'^ and for large n, i7„ x 1. Therefore, we only need deal with 
the term e^/A,y. 

Using assumption (B4), continuing the above with relevant terms, and notic- 
ing that sup„ /„ occurs at n = 1, it follows that for x large enough 



> x] + 



> y] dy^x 



dy 



xP- 



yP 



0. 



This allows for the exchange of integration end hence shows the desired result. 

□ 

Proof of Proposition 6. We can expand (11) for any A(B„) € 5 as 



(A, -If 1/3,1^ + ?^ 



(41) 



i?(A) i?MA(B„)) = ^ 

To form an estimator of i?, wc notice that E^^. (|i?„j p — e^/A„j) = |/3jp. Hence, 

=■2 



R{\) := ^ 



(A,-l)^ 



(42) 



is an unbiased estimate of R{X). We can make a substitution 



which produces 



R{X) = j2[ih ~ hf\Bn 



p 



(43) 



Finally, note that the second term in R doesn't depend on A, so it can be ignored 
for minimization purposes. Define 



i?„(A) :=^(A,-^,f |B„,.| 



(44) 
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which is proportional to R{X). This is our objective function for formulating 
estimators. 

However, there are some natural restrictions. First, define C := [0, 1]^. If 
we consider a transformed version of (41) by making the substitution ipj := 

|/3,f/(|/3,f + A„,),then 



A, 



(45) 



By inspection, the minimizer of (45) falls in C as ipj G [0, 1] for each j. Hence, 
we cannot get a lower risk by considering any more general sets and thus confine 
our attention to A G £. 

□ 

Proof of Theorem 7. Direct computation shows that 

R,=minj2i^-X,r\B,\'+s'j2^ 

j=i ]=i 



and 

i?.^min£(l-A,)^|i?,p4X:^. 



This implies that 



and 



|2 



nj to n\Dn\'j. 



Hence, the results reduces to comparing A„j to n\Dn\'^j. Note 



l,q 



therefore 



n 
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Observe 

1 " 
n|I?„|2 - A„, ^ D*^D,, ~ J2 1^ 

i,q i—1 



2 



n 

71. 

< - (n - 1) ^ I A,f + Ed-D'^^ l' + l')/2 



(n-l)^|A,f + (n-l)^|A,f -0 



4=1 'i=l 

where the last mequahty follows as |Aj||-Diq| < + \Dqj\'^)/2 by the 

arithmetic geometric inequality. □ 
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